Skip to content

BUG: rolling with MSVC 2017 build #21813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 26, 2018
Merged

Conversation

chris-b1
Copy link
Contributor

@chris-b1 chris-b1 commented Jul 8, 2018

For now just trying 3.7 on appveyor, not sure if all necessary conda packages are there

appveyor.yml Outdated
PYTHON_VERSION: "3.7"
PYTHON_ARCH: "64"
CONDA_PY: "37"
CONDA_NPY: "113"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this 114?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also very likely won't work, very little is built for 3.7 :<

@TomAugspurger
Copy link
Contributor

The c3i_test channel may have some missing packages.

@codecov
Copy link

codecov bot commented Jul 8, 2018

Codecov Report

Merging #21813 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21813      +/-   ##
==========================================
+ Coverage   91.99%      92%   +<.01%     
==========================================
  Files         167      170       +3     
  Lines       50578    50553      -25     
==========================================
- Hits        46530    46510      -20     
+ Misses       4048     4043       -5
Flag Coverage Δ
#multiple 90.4% <ø> (ø) ⬆️
#single 42.21% <ø> (+0.04%) ⬆️
Impacted Files Coverage Δ
pandas/io/formats/console.py 65.15% <0%> (-4.55%) ⬇️
pandas/core/frame.py 97.19% <0%> (ø) ⬆️
pandas/core/generic.py 96.47% <0%> (ø) ⬆️
pandas/core/internals/blocks.py 94.46% <0%> (ø)
pandas/core/internals/concat.py 98.37% <0%> (ø)
pandas/core/internals/managers.py 96.52% <0%> (ø)
pandas/core/indexes/base.py 96.37% <0%> (ø) ⬆️
pandas/core/arrays/categorical.py 95.95% <0%> (ø) ⬆️
pandas/io/formats/excel.py 97.39% <0%> (+0.01%) ⬆️
pandas/core/reshape/merge.py 94.16% <0%> (+0.05%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0828c25...5d40eae. Read the comment docs.

@chris-b1
Copy link
Contributor Author

chris-b1 commented Jul 9, 2018

Well, good news is we got a repro of this on our CI - it's not a python 3.7 issue, but instead a Visual Studio 2017 one.
https://ci.appveyor.com/project/pandas-dev/pandas/build/1.0.15961/job/r9axp415no6jubnr#L2059

Still not convinced this isn't a compiler bug, but having trouble to get a simple case to reproduce - one option would be to build our wheels VS 2015 - but annoying, b/c python forces the newest version, so can't have 2017 installed. I'll keep trying some things.

@chris-b1 chris-b1 changed the title BUG: rolling with 3.7 and windows BUG: rolling with MSVC 2017 build Jul 11, 2018
@chris-b1
Copy link
Contributor Author

This is kind of horrible, but it does fix the problem (just a lint issue I pushed for) and I'm out of ideas - any thoughts or suggestions @jreback, @TomAugspurger ?

@@ -654,16 +654,21 @@ cdef inline void add_var(double val, double *nobs, double *mean_x,
double *ssqdm_x) nogil:
""" add a value from the var calc """
cdef double delta

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing

if val != val:
    return

# rest of code

might work (and be more clear here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion, sadly didn't work

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Numeric Operations Arithmetic, Comparison, and Logical operations Windows Windows OS labels Jul 11, 2018
@TomAugspurger
Copy link
Contributor

@chris-b1 could you summarize the issue? You think MSVC is just optimizing away if val == val, which isn't true when val is NaN?

@chris-b1
Copy link
Contributor Author

That's right, optimizing maybe not the right word, but somehow the add_var function is being turned into a no-op, the mean_x, ssqdm_x, and nobs variables don't get updated. Adding any kind of unconditional action to the function fixes it - originally discovered by noticing it works with a print statement thrown in.

And yes, val==val is a not-a-NaN check. We use it all over the place in this module, so it's not as if that check is universally being treated wrong, but something about this particular context makes MSVC think it's unreachable or something.

@TomAugspurger
Copy link
Contributor

Ok, thanks...

@jjhelmus do we know anyone at Microsoft who works on their compilers that might be able to provide insight on #21813 (comment) and #21813 (comment)?

@jjhelmus
Copy link
Contributor

@zooba might be able to connect you to the right team at Microsoft

@zooba
Copy link
Contributor

zooba commented Jul 13, 2018

I can try, but I haven't been having as much luck recently.

Since this is Cython, can someone post the generated code for the comparison? There's a much better chance of getting something useful if they're looking at the code they actually compile, and not the code someone else is compiling.

@chris-b1
Copy link
Contributor Author

It's Cython, so of course the bigger file is a tangled mess, but the particular function is easy to read - lines of gist here.
https://gist.github.com/chris-b1/cd974fb6b4165285a48b6d6c12129b40#file-window-cpp-L11028-L11096

Called in a few places, but for instance, here is a problem one.
https://gist.github.com/chris-b1/cd974fb6b4165285a48b6d6c12129b40#file-window-cpp-L11948

Cython isn't doing anything particular interesting - it's a cdef function so basically translates line for line into c++.

I can post it later, not on this machine, but I've also tried to write a simpler, plain c++ example that triggers the problem, but it always worked - really not sure why.

@zooba
Copy link
Contributor

zooba commented Jul 13, 2018

I can reproduce it with this snippet (has a few tricks to try and defeat optimisations, but they are likely unnecessary):

#include <math.h>
#include <stdio.h>

double get_value(int i) {
    return i > 1 ? NAN : 7;
}

int check_default(double v1, double v2) {
    return ((v1 == v2) != 0);
}

#pragma float_control(precise, off, push)
int check_fast(double v1, double v2) {
    return ((v1 == v2) != 0);
}
#pragma float_control(pop)


int main(int argc, char **argv) {
    double x1 = get_value(argc);
    double x2 = get_value(argc);
    fprintf(stdout, "default: %d\nfast:    %d\n",
        check_default(x1, x2), check_fast(x1, x2)
    );
}

The comparison doesn't behave properly when you disable precise floating point operations. You may be doing this in the build with /fp:fast (if you build this snippet with that option, both checks return '1'; if you build with the default /fp:precise, it works). In fact, the docs for this flag specifically call out NaN comparisons as something that only work in precise mode.

So I'd guess you need to update the compiler flags for this one, or possibly add an explicit isnan() call.

@jbrockmendel
Copy link
Member

Is the MSVC info something that can be inferred from platform.uname (at compile-time)?

@chris-b1
Copy link
Contributor Author

Thanks for taking a look at this @zooba - unfortunately that's not exactly the same problem I'm seeing here, the compiled code is acting as if (v1 == v2) always evaluates to False/0. We are building with /fp:precise (via defaults, but I've also tried manually overriding)

@chris-b1
Copy link
Contributor Author

Interestingly using isnan from <math.h> does solve the problem here - still feels like MSVC is doing something wrong in the former case, but that's a reasonable enough workaround.

@zooba
Copy link
Contributor

zooba commented Jul 16, 2018

the compiled code is acting as if (v1 == v2) always evaluates to False/0

Do you have a disassembly showing this? This would be very strange to see as a compiler optimization (unless you mean (v == v) always evaluates to True/1?)

b/c python forces the newest version, so can't have 2017 installed

Just saw this while rereading the thread. If you start with all the tools on PATH and set DISTUTILS_USE_SDK=1 then it'll skip the autodetection and use whatever you provide. (And if you help encourage the ecosystem to get off distutils completely, then more of these problems go away :) )

@jbrockmendel
Copy link
Member

@chris-b1
Copy link
Contributor Author

@zooba - do you know how to pull symbols into the dissambly from a .pyd? I was playing around with dumpbin / compile flags, but never got anything better than, e.g. below, where everything is still in offsets.

 dumpbin.exe /DISASM pandas\_libs\window.cp36-win_amd64.pyd /OUT:tmp.txt

image

@chris-b1
Copy link
Contributor Author

do you know how to pull symbols into the dissambly from a .pyd

Answering my own question - I was missing '/DEBUG' on extra_link_args

@chris-b1
Copy link
Contributor Author

Alright @zooba - I think I have this to a reasonable bit of a dis-assembly - pushing the bounds of my knowledge but looks like a code-generation bug.

Here's MSVC 2015 (printfs surround a call to add_var) which works correctly. In particular the highlighted lines are the NaN check.
https://gist.github.com/chris-b1/1049fdabafc56b2a25de7ad377016baa#file-msvc2015-asm-L6-L8

And here is MSVC 2017. The function call doesn't get inlined - half of the NaN check gets done (jn) - then the stack is being set up, then the other half the check (jne). I believe in setting up the stack, the ZF flag originally set by ucomisd gets set back to 0 (from the sub) so the jne check will always be true, and the function unconditionally exits early.
https://gist.github.com/chris-b1/199f818c1fbb8f8c4108aa9e430df4f7#file-msvc2017-asm-L15-L19

@zooba
Copy link
Contributor

zooba commented Jul 16, 2018

@chris-b1 Yep, you're absolutely right. Thanks for tracking that down, I'll pass it on!

@zooba
Copy link
Contributor

zooba commented Jul 16, 2018

Here is the public side of the report: https://developercommunity.visualstudio.com/content/problem/294290/incorrect-codegen-for-double-equality.html Feel free to add pings there whenever you like

@chris-b1
Copy link
Contributor Author

Thanks @zooba!

and thanks @jbrockmendel - that's what I'll have to do to fix the 2.7 build, more modern MSVCs are basically standards compliant and have isnan where it should be.

@jreback
Copy link
Contributor

jreback commented Jul 17, 2018

does using the fix proposed in #21940 help here? e.g. npy_isnan?

@chris-b1
Copy link
Contributor Author

does using the fix proposed in #21940 help here? e.g. npy_isnan?

It does (latest commit has that and is passing on appveyor) - but less sure we want to apply that universally, I'll try perf on @jbrockmendel PR later, suspect it may be worse on Windows.

@jreback jreback added this to the 0.23.4 milestone Jul 25, 2018
@jreback
Copy link
Contributor

jreback commented Jul 25, 2018

@chris-b1 this looks good. can you rebase once more.

perf looks ok? (even if its slightly worse this is a bug :>)

Go ahead and merge on green.

@chris-b1
Copy link
Contributor Author

@jreback thanks, yeah perf will be fine, just slightly slower on Windows for this particular function

@chris-b1 chris-b1 merged commit 7a2fbce into pandas-dev:master Jul 26, 2018
@lumberbot-app
Copy link

lumberbot-app bot commented Jul 26, 2018

There seem to be a conflict, please backport manually

minggli added a commit to minggli/pandas that referenced this pull request Jul 28, 2018
* master:
  BENCH: asv csv reading benchmarks no longer read StringIO objects off the end (pandas-dev#21807)
  BUG: df.agg, df.transform and df.apply use different methods when axis=1 than when axis=0 (pandas-dev#21224)
  BUG: bug in GroupBy.count where arg minlength passed to np.bincount must be None for np<1.13 (pandas-dev#21957)
  CLN: Vbench to asv conversion script (pandas-dev#22089)
  consistent docstring (pandas-dev#22066)
  TST: skip pytables test with not-updated pytables conda package (pandas-dev#22099)
  CLN: Remove Legacy MultiIndex Index Compatibility (pandas-dev#21740)
  DOC: Reword doc for filepath_or_buffer in read_csv (pandas-dev#22058)
  BUG: rolling with MSVC 2017 build (pandas-dev#21813)
@TomAugspurger TomAugspurger mentioned this pull request Jul 30, 2018
@jreback
Copy link
Contributor

jreback commented Jul 30, 2018

cc @chris-b1 do you want to see if you can backprot to 23.x branch?

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Aug 2, 2018
* Appveyor 3.7

* ci package list

* change image type

* try hack fix

* lint

* use isnan on problem function

* use numpy compat isnan

* use right isnan

* work around OSX math undefs

* cleanup const

* fix reversion

* ...

(cherry picked from commit 7a2fbce)
TomAugspurger pushed a commit that referenced this pull request Aug 3, 2018
* Appveyor 3.7

* ci package list

* change image type

* try hack fix

* lint

* use isnan on problem function

* use numpy compat isnan

* use right isnan

* work around OSX math undefs

* cleanup const

* fix reversion

* ...

(cherry picked from commit 7a2fbce)
@WillAyd WillAyd added the Window rolling, ewma, expanding label Sep 4, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* Appveyor 3.7

* ci package list

* change image type

* try hack fix

* lint

* use isnan on problem function

* use numpy compat isnan

* use right isnan

* work around OSX math undefs

* cleanup const

* fix reversion

* ...
@TomAugspurger TomAugspurger mentioned this pull request Oct 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode Window rolling, ewma, expanding Windows Windows OS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.rolling().std() only returns NaN in Python3.7
8 participants